Conference Proceedings

Learning Biological Sequence Types Using the Literature

Mohamed Reda Bouadjenek, Karin Verspoor, Justin Zobel

Association for Computing Machinery | Published: 2017

Abstract

In this paper, we explore automatic classification of biological sequence types for records in biological sequence databases. The sequence type attribute provides important information about the nature of the sequence represented in a record, and is often used in search to filter out irrelevant sequences. However, the sequence type attribute is generally a non-mandatory free-text field, and is therefore subject to many errors, including typos, mis-assignment, and non-assignment. In GenBank, this problem affects roughly 18% of records, an alarming number that should worry the biocuration community. To address this problem of automatic sequence type classification, we propose the use of literature assoc..
